Amendments to the Claims
This listing of claims will replace all prior versions of claims in the application: 
Listing of Claims: 

1. (Currently Amended) A data analysis system, comprising:

a first component associated with a server of the data analysis system that facilitates 
generation of a first data set related to web page information obtained via a communication 
system; and 

a second component that coordinates a second data set relating to web page information 
from at least one distributed resource associated with at least a client of the server which 
interacts with the communication system; the second data set is utilized to refine the first data 
set, wherein refining the first data set comprises adding unknown information to the first data set
when new information is received from the distributed source via the second data set or updating 
existing information in the first data set when changes have occurred in the contents of the web 
page information as indicated by the second data set.

2. (Original) The system of claim 1, the first component comprising an internet 
web crawler. 

3. (Original) The system of claim 1, the first component comprising an intranet 
web crawler. 

4. (Original) The system of claim 1, the second component further utilized to 
optimize reception of data from the distributed resources. 

5. (Original) The system of claim 1, the second component provides a 
scheduling function to control reception of the second data set from the at least one 
distributed resource. 

6. (Original) The system of claim 1, the second component utilized to facilitate
communication traffic reduction via the communication system by employing a proper set of 
weak indicator functions representative of the first data set. 

7. (Original) The system of claim 6, the second component further utilized to 
randomly select and transmit a weak indicator function selected from the proper set of weak 
indicator functions to at least one of the distributed resources. 

8. (Original) The system of claim 1, the second component further utilized to 
compare the first data set and the second data set to detect spoof data retrieved by the first 
component. 

9. (Original) The system of claim 1, the second component further utilized to 
generate status information about data related to the first data set; the status information 
transmitted to at least one distributed resource. 

10. (Original) The system of claim 9, the status information comprising, at least in 
part, a freshness flag to indicate freshness of information related to the first data set. 

11. (Original) The system of claim 9, the status information comprising, at least in
part, a hash of contents of information related to the first data set. 

12. (Original) The system of claim 9, the status information comprising, at least in 
part, a copy of information of the first data set. 

13. (Original) The system of claim 1, the communication system comprising an 
internet. 

14. (Original) The system of claim 1, the communication system comprising a 
world wide web. 

15. (Original) The system of claim 1, the communication system comprising an
intranet. 



16. (Original) The system of claim 15, the intranet comprising a local area 
network. 



17. (Original) The system of claim 15, the intranet comprising a wide area 
network. 



18. (Cancelled) 



19. (Original) The system of claim 1, the distributed resources comprising trusted 
entities interactive with the communication system and the second component. 



20. (Original) The system of claim 1, the first data set comprising internet web 
page data. 



21. (Original) The system of claim 1, the first data set comprising intranet web
page data. 



22. (Cancelled) 



23. (Original) The system of claim 1, the second data set comprising, at least in 
part, a hash of contents of at least one web page. 



24. (Original) The system of claim 1, the second data set comprising, at least in 
part, a Uniform Resource Locator (URL) of at least one web page. 



25. (Original) The system of claim 1, the second data set comprising, at least in 
part, a time stamp relating to an acquisition time for information about at least one web page. 

26. (Currently Amended) The system of claim 1, the second data set comprising,
at least in part, a delta indication of the changes to contents of the at least one web page. 

27. (Original) The system of claim 26, the delta indication including, at least in 
part, a hash of previous contents of a web page and a hash of recent contents of the web page. 

28. (Original) The system of claim 1, the second data set comprising, at least in 
part, a status indication of changes to contents of at least one web page. 

29. (Original) The system of claim 28, the status indication including, at least in 
part, a percentage relating to an amount of change of contents of a web page. 

30. (Original) The system of claim 28, the status indication including, at least in 
part, a significance indicator to signify importance of changes in contents of a web page. 

31. (Original) The system of claim 1, the second data set comprising internet web 
page data. 

32. (Original) The system of claim 1, the second data set comprising intranet web 
page data. 

33. (Original) The system of claim 1, the second data set comprising data 
compiled utilizing at least one weak indicator function randomly selected from a set of weak 
indicator functions; the set of weak indicator functions representative of the first data set. 

34. (Original) The system of claim 1, further comprising a search component to 
accept at least one search query and generate at least one search reply having at least a 
portion of the first data set represented by information embedded in the search reply. 

35. (Original) The system of claim 1, further comprising a web page server
component to construct web pages having at least a portion of the first data set represented by 
information embedded in at least one link found on at least one constructed web page. 

36. (Original) The system of claim 1, further comprising a storage component to 
store the first data set. 

37. (Currently Amended) A method for facilitating data analysis, comprising: 
generating a first data set relating to a second data set obtained from web pages
interactive with a server of a communication system;

receiving a third data set from at least one distributed resource comprising a client of the 
server that is interactive with the communication system; the third data set comprising web page 
related information generated by the distributed resource; and 

refining the second data set to reflect information obtained from the third data set, by:

adding unknown information to the second data set when new information is received 
from the distributed source via the third data set; 

updating existing information in the second data set when changes have occurred as 
indicated by the third data set; and 

passing status information to the distributed resource through one or more indicators after 
information from the third data set has been analyzed. 

38. (Original) The method of claim 37, the first data set comprising a 
representation of the second data set. 

39. (Original) The method of claim 38, the representation of the second data set 
comprising, at least in part, a hash of contents of at least one web page contained in the 
second data set. 

40. (Original) The method of claim 38, the representation of the second data set 
comprising, at least in part, a status indication of at least one web page contained in the 
second data set. 

41. (Original) The method of claim 40, the status indication comprising a
freshness flag to indicate if the web page information is current. 

42. (Original) The method of claim 37, the first data set comprising a copy of the 
second data set. 

43. (Original) The method of claim 37, the second data set comprising web page 
information compiled by a web crawler. 

44. (Original) The method of claim 37, the third data set comprising web page 
information based upon client accessed web page information on the communication system. 

45. (Cancelled) 

46. (Original) The method of claim 37, the communication system comprising an 
internet. 

47. (Original) The method of claim 37, the communication system comprising an 
intranet. 

48. (Cancelled) 

49. (Original) The method of claim 37, further including: 

transmitting the first data set to at least one distributed resource that is interactive with 
the communication system making the first data set available to be utilized by the distributed 
resource to generate the third data set. 

50. (Original) The method of claim 38, further including: 

generating a set of weak indicator functions to represent the second data set; and 
selecting random weak indicator functions from the set of weak indicator functions to 
transmit to the distributed resources as the first data set. 

51. (Original) The method of claim 50, the set of weak indicator functions
comprising a proper set of weak indicator functions such that a non-zero probability exists 
that a randomly selected weak indicator function can identify a new web page. 

52. (Original) The method of claim 50, generating a set of weak indicator 
functions comprising: 

providing a dictionary representative of the second data set; 
partitioning randomly the dictionary into non-overlapping subdictionaries; and 
creating a function where I(x) = 1 if and only if at least one subdictionary's weak 
indicator function is equal to one. 

53. (Original) The method of claim 37, further including: 

comparing the third data set to the second data set to reveal spoof data included in the 
second data set. 

54. (Original) The method of claim 37, further including: 

optimizing reception of at least one third data set through scheduling of the distributed 
resources. 

55. (Original) The method of claim 37, further including: 
receiving a web page search query from at least one distributed resource; 
generating a web search results page in response to the web page search query from the
distributed resource;

embedding portions of the first data set in links found on the web search results page; and 
transmitting the web search results page as a representation of at least a portion of the 
second data set to the distributed resource. 

56. (Original) The method of claim 37, further including: 
constructing a web page utilizing at least a portion of the first data set to embed
information about links found in the web page; and

transmitting the web page to disseminate the first data set to at least one distributed 
resource. 

57. (Currently Amended) A data analysis system, comprising:

means for generating at least one first data set from a server of communication system; 

means for receiving and coordinating at least one second data set from at least one 
distributed resource client which interacts with the server of the communication system; and 

means for refining the first data set utilizing at least one second data set, wherein refining
the first data set comprises the at least one of adding unknown information to the first data set 
when new information is received from the client via the second data set and updating existing 
information in the first data set when changes have occurred in the web page as indicated by the 
second data set.

58. (Original) The system of claim 57, the means for generating at least one first 
data set including a web crawler. 

59. (Original) The system of claim 58, the first data set comprising data relating to 
web pages obtained by the web crawler. 

60. (Currently Amended) The system of claim 57, the second data set comprising 
web page comparison data compiled by the at least one client distributed resource and based, 
at least in part, upon representative data of the first data set. 

61. (Currently Amended) A data analysis system, comprising:

a first component associated with at least one client of a distributed web crawling system 
that generates web page information from at least one visited web site for utilization in [[a]] 
the distributed web crawling system; and the web page information transmitted by the first 
component to 

a second component associated with a server that receives the web page information 
transmitted by the first component via a communication system, wherein the first component
receives a set of data from the second component to utilize in the generation of the web page 
information comprising at least comparison data based on the visited web page and the 
received set of data. 

62. (Original) The system of claim 61, the first component providing at least one
time stamp relevant to a time of acquisition of data utilized in the generation of the web page 
information. 

63. (Original) The system of claim 61, the first component receiving a set of 
embedded web crawler data from at least one search result page to utilize in the generation of 
the web page information. 

64. (Original) The system of claim 61, the first component receiving a set of 
embedded web crawler data from at least one web page to utilize in the generation of the web 
page information. 

65. (Currently Amended) The system of claim 61, the first component further 
operational to obtain web page data indirectly via at least one other client of the distributed 
crawler system to provide a gateway to [[a]] the second component to substantially reduce 
traffic flow to the second component. 

66. (Cancelled) 

67. (Original) The system of claim 61, the generated web page information
comprising, at least in part, a status indication of changes to contents of at least one web 
page. 

68. (Original) The system of claim 67, the status indication including, at least in 
part, a percentage relating to an amount of change of contents of a web page. 

69. (Original) The system of claim 67, the status indication including, at least in 
part, a significance indicator to signify importance of changes in contents of a web page. 

70. (Original) The system of claim 61, at least a portion of the generated web
page information made available for peer-to-peer client transmission via the communication 
system. 

71. (Original) The system of claim 61, the generated web page information
compiled utilizing a randomly selected weak indicator function from a proper set of weak 
indicator functions that represent web page data compiled by a web crawler. 

72. (Original) The system of claim 61, the communication system comprising an 
internet. 

73. (Original) The system of claim 61, the communication system comprising an 
intranet. 

74. (Original) The system of claim 61, further comprising a storage component to 
store the web page information. 

75. (Original) The system of claim 61, further comprising a notification
component that determines when and if the generated web page information is to be 
communicated via the communication system. 

76. (Currently Amended) The system of claim 75, the notification component 
receiving scheduling information from [[a]] the second component; the scheduling 
information relating to obtaining and transmitting the generated web page information. 

77. (Cancelled) 

78. (Currently Amended) The system of claim 61 [[77]], the first component utilizing
web search servers outside of the distributed web crawling system to retrieve data unknown 
to the second component. 

79. (Currently Amended) The system of claim 61 [[77]], the first component
generates comparison data based on the web page information and the received set of data;
the first component making the comparison data discretionarily available to the second 
component via the communication system. 

80. (Currently Amended) The system of claim 61 [[79]], the comparison data
including, at least in part, at least one Uniform Resource Locator (URL) of at least one web 
page. 

81. (Currently Amended) The system of claim 61 [[79]], the comparison data
including, at least in part, a hash of contents of at least one web page representative of a 
recent web site visit. 

82. (Currently Amended) The system of claim 61 [[79]], the comparison data
including, at least in part, a delta indication of contents of at least one web page. 

83. (Original) The system of claim 82, the delta indication including, at least in 
part, a hash of previous contents of a web page and a hash of recent contents of the web page. 

84. (Currently Amended) The system of claim 61 [[77]], the second component
comprising a server of the distributed crawling system. 

85. (Currently Amended) The system of claim 77, the second component 
comprising a client of the distributed crawling system. 

86. (Currently Amended) The system of claim 61 [[77]], the generated web page
information comprising data unknown to the second component. 

87. (Currently Amended) The system of claim 61 [[77]], at least a portion of the
received set of data made available for peer-to-peer client transmission via the 
communication system. 

88. (Currently Amended) The system of claim 61 [[77]], the received set of data
comprising a dictionary for data compiled by a web crawler. 

89. (Currently Amended) The system of claim 61 [[77]], the received set of data
comprising a representation of data compiled by a web crawler; the representation of data 
generated by utilizing a weak indicator function. 

90. (Currently Amended) The system of claim 61 [[77]], the received set of data
comprising a copy of data compiled by a web crawler. 

91. (Currently Amended) The system of claim 61 [[77]], further comprising a
storage component to store the set of data received from the second component. 

92. (Currently Amended) A method for facilitating data analysis, comprising: 
compiling a first data set derived from accessing web pages via a client of a
communication system; and

transmitting, selectively, the first data set to an entity comprising at least a server of a 
distributed crawling system that is interactive with the communication system; 

receiving a representation of a second data set compiled by the server of the web crawler; 
the second data set relating to at least one web page from the communication system; and 

utilizing the second data set to control which web pages to visit to compile the first data
set.

93. (Cancelled) 

94. (Cancelled) 

95. (Original) The method of claim 92, the first data set comprising, at least in 
part, a uniform resource locator (URL) for at least one web page. 

96. (Original) The method of claim 92, the first data set comprising, at least in
part, a hash of contents of at least one web page. 

97. (Original) The method of claim 92, selectively transmitting based upon time
of day.

98. (Original) The method of claim 92, selectively transmitting based upon 
priority of at least one web page. 

99. (Original) The method of claim 92, selectively transmitting based upon 
percentage of content change of at least one web page. 

100. (Original) The method of claim 92, selectively transmitting based upon 
identifying at least one new web page. 

101. (Cancelled) 

102. (Currently Amended) The method of claim 92 [[104]], receiving the
representation of the second data set is accomplished via reception of a web page with 
embedded information derived from the second data set and generated by a web page hosting 
server with access to the second data set. 

103. (Currently Amended) The method of claim 92 [[104]], receiving the
representation of the second data set is accomplished via reception of a search results page 
with embedded information derived from the second data set and generated in response to a 
query transmitted to a search server having access to the second data set. 

104. (Cancelled) 

105. (Currently Amended) The method of claim 92 further comprising:
determining when to transmit the first data set via the communication system based upon
the second data set.

106. (Original) The method of claim 105, the second data set containing a 
freshness indicator to indicate when its data is stale and requires updating via the first data 
set. 

107. (Original) The method of claim 105, the second data set containing a schedule 
for when the first data set is to be transmitted. 

108. (Currently Amended) The method of claim 92 further comprising: 
comparing at least a portion of the second data set with at least a portion of information
obtained via accessing web pages to create comparison data; and

generating a representation of the comparison data to derive the first data set. 

109. (Original) The method of claim 108, the first data set comprising data 
unknown to the second data set. 

110. (Original) The method of claim 109, the unknown data comprising only 
unknown data derived from at least one search results page from a search server outside of 
the distributed crawling system. 

111. (Original) The method of claim 108, the first data set comprising content 
changes to web pages represented by the second data set. 

112. (Original) The method of claim 108, the first data set comprising status
information relating to web pages represented by the second data set. 

113. (Cancelled) 




10/670,681 



MS305080.01/MSFTP475US 



114. (Currently Amended) A computer readable medium having stored thereon 
computer executable components comprising: [[of the system of claim 1.]]

a first component associated with a server of the data analysis system that 
facilitates generation of a first data set related to web page information obtained via a 
communication system; and 

a second component that coordinates a second data set relating to web page 
information from at least one distributed resource associated with at least a client of the server 
which interacts with the communication system; 

the second data set is utilized to refine the first data set, wherein refining the first data set 
comprises adding unknown information to the first data set when new information is received 
from the distributed source via the second data set and updating existing information in the first 
data set when changes have occurred in the contents of the web page information as indicated by 
the second data set. 

115. (Original) A device employing the method of claim 37 comprising at least one 
selected from the group consisting of a computer, a server, and a handheld electronic device. 

116. (Original) A device employing the system of claim 1 comprising at least one 
selected from the group consisting of a computer, a server, and a handheld electronic device. 